An investigation on scaling parameter and distance metrics in semi-supervised Fuzzy c-means

Authors

Daphne Teck Ching Lai

Jonathan M. Garibaldi

Abstract

The scaling parameter α helps maintain a balance between supervised and unsupervised learning in semi-supervised Fuzzy c-Means (ssFCM). In this study, we investigated the effects of different α values (0.1, 0.5, 1 and 10) in Pedrycz and Waletzky's ssFCM with various amounts of labelled data (10%, 20%, 30%, 40%, 50% and 60%) and three distance metrics (Euclidean, Mahalanobis and kernel-based) on the Nottingham Tenovus Breast Cancer dataset and five popular UCI datasets. Higher α values were found to produce better accuracy using Euclidean distance on four of the six datasets. For Mahalanobis distance, increasing α improved accuracy up to α = 1 but not at α = 10 in three of the six datasets. For kernel-based distance, accuracy tended to decrease with increasing α, as observed in four of the six datasets. Such trends in the effects of α on the classification results across distance metrics and datasets can be used to form a guide for selecting α. Care should be taken in selecting α, as a suitable value depends on the distance metric, particularly for the Mahalanobis and kernel-based distance metrics, and on the dataset used.
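To make the role of α concrete, the following is a minimal sketch of the membership update commonly given for Pedrycz and Waletzky's ssFCM with fuzzifier m = 2. It is an illustration under stated assumptions, not the authors' implementation: the function name `ssfcm_memberships` and its argument names are hypothetical, and the squared distances are assumed to be precomputed with whichever metric is in use (Euclidean, Mahalanobis or kernel-based).

```python
def ssfcm_memberships(sq_dists, f, b, alpha):
    """Memberships of ONE pattern to each cluster (assumed form, m = 2).

    sq_dists: squared distances from the pattern to each cluster centre (all > 0)
    f:        supervised memberships for the pattern (ignored when b == 0)
    b:        1 if the pattern is labelled, 0 otherwise
    alpha:    scaling parameter weighting the supervised term
    """
    # Sum of inverse squared distances, as in the standard FCM update.
    inv_sum = sum(1.0 / d for d in sq_dists)
    # Membership mass shared out by distance alone.
    free_mass = 1.0 + alpha * (1.0 - b * sum(f))
    return [
        (free_mass / (d * inv_sum) + alpha * f_i * b) / (1.0 + alpha)
        for d, f_i in zip(sq_dists, f)
    ]


# Hypothetical pattern: closer to cluster 0, but labelled as cluster 1.
sq_dists = [1.0, 4.0]
label = [0.0, 1.0]

unlabelled = ssfcm_memberships(sq_dists, label, b=0, alpha=1.0)
low_alpha = ssfcm_memberships(sq_dists, label, b=1, alpha=1.0)
high_alpha = ssfcm_memberships(sq_dists, label, b=1, alpha=10.0)
```

In this sketch an unlabelled pattern (b = 0) reduces to the ordinary FCM update regardless of α, while for a labelled pattern a larger α pulls the membership further towards the given label and away from what distance alone would suggest.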


Related articles

Investigating Distance Metrics in Semi-supervised Fuzzy c-Means for Breast Cancer Classification

In previous work, semi-supervised Fuzzy c-means (ssFCM) was used as an automatic classification technique to classify the Nottingham Tenovus Breast Cancer (NTBC) dataset as no method to do this currently exists. However, the results were poor when compared with semi-manual classification. It is known that the NTBC data is highly non-normal and it was suspected that this affected the poor result...


An exploration of improvements to semi-supervised fuzzy c-means clustering for real-world biomedical data

This thesis explores various detailed improvements to semi-supervised learning (using labelled data to guide clustering or classification of unlabelled data) with fuzzy c-means clustering (a ‘soft’ clustering technique which allows data patterns to be assigned to multiple clusters using membership values), with the primary aim of creating a semi-supervised fuzzy clustering algorithm that shows ...


A methodology for automatic classification of breast cancer immunohistochemical data using semi-supervised Fuzzy c-means

Previously, a semi-manual method was used to identify six novel and clinically useful classes in the Nottingham Tenovus Breast Cancer dataset. 663 out of 1076 patients were classified. The objectives of our work are threefold. Firstly, our primary objective is to use one single automatic method (post-initialisation) to reproduce the six classes for the 663 patients and to classify the remainin...


Enhancement of fuzzy clustering by mechanisms of partial supervision

Semi-supervised (or partially supervised) fuzzy clustering plays an important and unique role in discovering hidden structure in data when only a limited fraction of the patterns is labeled. The objective of this study is to investigate and quantify the effect of various distance functions (distances) on the performance of the clustering mechanisms. The underlying goal of endowing the c...


Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...




Publication date: 2012
